Combinatorial Semi-supervised Incremental Support Vector Machine Learning Algorithm
GUO Husheng 1,2, WANG Wenjian 1,2, PAN Shichao 1
1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006
2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006
Abstract Incremental support vector machines (ISVMs) have difficulty selecting the best incremental sample at each incremental learning step, which weakens the generalization performance of the resulting model. To address this problem, a combinatorial semi-supervised incremental support vector machine learning algorithm (ICS3VM) is proposed. The best incremental samples are selected by combinatorially labeling the large-scale unlabeled samples in batches. At each step, the most valuable unlabeled samples, those lying within the classification margin, are added to the training set to correct the model. The label assignment that yields the largest margin is taken as the final label to ensure accuracy. Experiments on standard datasets show that the proposed ICS3VM achieves good generalization performance and high learning efficiency.
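
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every name in it (`train_linear_svm`, `geometric_margin`, `ics3vm_step`) is hypothetical. It assumes a linear kernel, a simple full-batch subgradient trainer, the band |f(x)| < 1 as the "classification margin", and the geometric margin of the retrained model as the "largest margin" criterion. None of these choices is specified in the abstract.

```python
from itertools import product
from math import sqrt, inf


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Full-batch subgradient descent on lam/2*||w||^2 + mean hinge loss.

    Assumption: a stand-in trainer for illustration, not the paper's solver.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # sample violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def geometric_margin(w, b, X, y):
    """Smallest signed distance of any training sample to the hyperplane."""
    norm = sqrt(dot(w, w))
    if norm == 0.0:
        return -inf
    return min(yi * (dot(w, xi) + b) for xi, yi in zip(X, y)) / norm


def ics3vm_step(X_lab, y_lab, X_unl, batch_size=2):
    """One incremental step: pick in-margin samples, try every labeling."""
    w, b = train_linear_svm(X_lab, y_lab)
    # Candidates: unlabeled samples inside the margin band |f(x)| < 1.
    in_margin = [x for x in X_unl if abs(dot(w, x) + b) < 1]
    batch = in_margin[:batch_size]
    best_labels, best_margin = None, -inf
    # Combinatorial labeling: 2**len(batch) candidate label assignments.
    for labels in product((-1, 1), repeat=len(batch)):
        X_try, y_try = X_lab + batch, y_lab + list(labels)
        w_try, b_try = train_linear_svm(X_try, y_try)
        m = geometric_margin(w_try, b_try, X_try, y_try)
        if m > best_margin:
            best_labels, best_margin = labels, m
    return batch, best_labels


# Toy one-dimensional data: two labeled samples, four unlabeled ones.
X_lab, y_lab = [[-2.0], [2.0]], [-1, 1]
X_unl = [[-1.0], [4.0], [1.0], [-4.0]]
batch, labels = ics3vm_step(X_lab, y_lab, X_unl)
print(batch, labels)
```

Enumerating every labeling costs 2^k retrainings for a batch of k samples, so a sketch like this is only practical for small batches. Processing the unlabeled pool in batches, as the abstract describes, keeps k small.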
Received: 19 May 2015
Corresponding author: WANG Wenjian, born in 1968, Ph.D., professor. Her research interests include computational intelligence and data mining.
About the authors: GUO Husheng, born in 1986, Ph.D., lecturer. His research interests include machine learning and data mining. PAN Shichao, born in 1987, master's student. Her research interests include machine learning and intelligent computing.